Fast multivariate empirical cumulative distribution function with connection to kernel density estimation

Abstract

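This paper revisits the problem of computing empirical cumulative distribution functions (ECDF) efficiently on large, multivariate datasets. Computing an ECDF at one evaluation point requires $\mathcal{O}(N)$ operations on a dataset composed of $N$ data points. Therefore, a direct evaluation of ECDFs at $N$ evaluation points requires a quadratic $\mathcal{O}(N^2)$ operations, which is prohibitive for large-scale problems. Two fast and exact methods are proposed and compared. The first method is based on fast summation in lexicographical order, with $\mathcal{O}(N\log N)$ complexity, and requires the evaluation points to lie on a regular grid. The second method is based on the divide-and-conquer principle, with $\mathcal{O}(N\log(N)^{(d-1)\vee 1})$ complexity, and requires the evaluation points to coincide with the input points. The two fast algorithms are described and detailed in the general $d$-dimensional case, and numerical experiments validate their speed and accuracy. Secondly, the paper establishes a direct connection between cumulative distribution functions and kernel density estimation (KDE) for a large class of kernels. This connection paves the way for fast exact algorithms for multivariate kernel density estimation and kernel regression. Numerical tests with the Laplacian kernel validate the speed and accuracy of the proposed algorithms. A broad range of large-scale multivariate density estimation, cumulative distribution estimation, survival function estimation and regression problems can benefit from the proposed numerical methods.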
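To make the cost contrast in the abstract concrete, the following is a minimal Python sketch, not the authors' algorithm. It compares a direct ECDF evaluation with a simple grid-based variant that bins the data once and then takes cumulative sums along each axis, which is in the spirit of summing over a grid instead of re-scanning the data for every evaluation point. The function names `ecdf_naive` and `ecdf_on_grid` and the NumPy-based layout are assumptions made for illustration.

```python
import numpy as np

def ecdf_naive(data, points):
    """Direct ECDF: for each evaluation point, count the data points that are
    component-wise <= it. Cost grows like (number of points) x N x d."""
    data = np.asarray(data, dtype=float)
    points = np.asarray(points, dtype=float)
    # below[m, n] is True when data[n] <= points[m] in every coordinate
    below = np.all(data[None, :, :] <= points[:, None, :], axis=2)
    return below.mean(axis=1)

def ecdf_on_grid(data, edges):
    """ECDF at every node of a rectangular grid (illustrative sketch).

    edges[k] is the sorted 1-D array of grid coordinates along dimension k.
    Each data point is binned once, then cumulative sums along each axis
    turn the bin counts into joint 'less than or equal' counts.
    """
    data = np.asarray(data, dtype=float)
    n, d = data.shape
    # Index of the first grid node >= the coordinate; len(edges[k]) if none.
    idx = [np.searchsorted(edges[k], data[:, k], side="left") for k in range(d)]
    # One extra slot per axis collects points beyond the last grid node.
    counts = np.zeros(tuple(len(e) + 1 for e in edges))
    np.add.at(counts, tuple(idx), 1.0)
    for axis in range(d):
        counts = np.cumsum(counts, axis=axis)
    # Drop the overflow slots and normalise by the sample size.
    return counts[tuple(slice(0, len(e)) for e in edges)] / n

rng = np.random.default_rng(0)
data = rng.normal(size=(500, 2))
edges = [np.linspace(-2, 2, 9), np.linspace(-1, 3, 5)]
grid = ecdf_on_grid(data, edges)
nodes = np.array([[x, y] for x in edges[0] for y in edges[1]])
print(np.allclose(grid.ravel(), ecdf_naive(data, nodes)))
```

In this sketch the grid variant touches each data point once for binning and each grid node once per dimension for the cumulative sums, whereas the direct version compares every evaluation point against every data point.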

Similar resources

Kernel estimation of multivariate cumulative distribution function

A smooth kernel estimator is proposed for multivariate cumulative distribution functions (cdf), extending the work of Yamato [H. Yamato, Uniform convergence of an estimator of a distribution function, Bull. Math. Statist. 15 (1973), pp. 69–78.] on univariate distribution function estimation. Under assumptions of strict stationarity and geometrically strong mixing, we establish that the proposed...

Efficient Estimation of the Density and Cumulative Distribution Function of the Generalized Rayleigh Distribution

The uniformly minimum variance unbiased (UMVU), maximum likelihood, percentile (PC), least squares (LS) and weighted least squares (WLS) estimators of the probability density function (pdf) and cumulative distribution function are derived for the generalized Rayleigh distribution. This model can be used quite effectively in modelling strength data and also modeling general lifetime data. It has...

Memory-Efficient Orthogonal Least Squares Kernel Density Estimation using Enhanced Empirical Cumulative Distribution Functions

A novel training algorithm for sparse kernel density estimates by regression of the empirical cumulative density function (ECDF) is presented. It is shown how an overdetermined linear least-squares problem may be solved by a greedy forward selection procedure using updates of the orthogonal decomposition in an order-recursive manner. We also present a method for improving the accuracy of the es...

Fast and Extensible Online Multivariate Kernel Density Estimation

In this paper we present xokde++, a state-of-the-art online kernel density estimation approach that maintains Gaussian mixture models of input data streams. The approach follows state-of-the-art work on online density estimation, but was redesigned with computational efficiency, numerical robustness, and extensibility in mind. Our approach produces comparable or better results than the current sta...

Empirical Testing of Fast Kernel Density Estimation Algorithms

We present results of experiments testing the Fast Gauss Transform, Improved Fast Gauss Transform, and Dual-Tree methods (using kd-tree and Anchors Hierarchy data structures) for fast Kernel Density Estimation (KDE). We examine the performance of these methods with respect to data set size, dimension, allowable error, and data set structure (“clumpiness”), measured in terms of CPU time and memo...

Journal

Journal title: Computational Statistics & Data Analysis

Year: 2021

ISSN: 0167-9473, 1872-7352

DOI: https://doi.org/10.1016/j.csda.2021.107267